âIn February 2021, the state of Texas suffered a major power crisis, which came about as a result of three severe winter storms sweeping across the United States on February 10â11, 13â17, and 15â20.â1 For more background, check out these engineering and political perspectives.
For this assignment, you are tasked with:
- estimating the number of homes in Houston that lost power as a result
of the first two storms
- investigating if socioeconomic factors are predictors of communities
recovery from a power outage
Your analysis will be based on remotely-sensed night lights data, acquired from the Visible Infrared Imaging Radiometer Suite (VIIRS) onboard the Suomi satellite. In particular, you will use the VNP46A1 to detect differences in night lights before and after the storm to identify areas that lost electric power.
To determine the number of homes that lost power, you link (spatially join) these areas with OpenStreetMap data on buildings and roads.
To investigate potential socioeconomic factors that influenced recovery, you will link your analysis with data from the US Census Bureau.
Use NASAâs Worldview to explore the data around the day of the storm. There are several days with too much cloud cover to be useful, but 2021-02-07 and 2021-02-16 provide two clear, contrasting images to visualize the extent of the power outage in Texas.
VIIRS data is distributed through NASAâs Level-1 and Atmospheric Archive & Distribution System Distributed Active Archive Center (LAADS DAAC). Many NASA Earth data products are distributed in 10x10 degree tiles in sinusoidal equal-area projection. Tiles are identified by their horizontal and vertical position in the grid. Houston lies on the border of tiles h08v05 and h08v06. We therefore need to download two tiles per date.
As youâre learning in EDS 220, accessing, downloading, and preparing
remote sensing data is a skill in itâs own right! To prevent this
assignment from being a large data wrangling challenge, we have
downloaded and prepped the following files for you to work with, stored
in the VNP46A1 folder.
VNP46A1.A2021038.h08v05.001.2021039064328.h5.tif: tile
h08v05, collected on 2021-02-07VNP46A1.A2021038.h08v06.001.2021039064329.h5.tif: tile
h08v06, collected on 2021-02-07VNP46A1.A2021047.h08v05.001.2021048091106.h5.tif: tile
h08v05, collected on 2021-02-16VNP46A1.A2021047.h08v06.001.2021048091105.h5.tif: tile
h08v06, collected on 2021-02-16Typically highways account for a large portion of the night lights observable from space (see Googleâs Earth at Night). To minimize falsely identifying areas with reduced traffic as areas without power, we will ignore areas near highways.
OpenStreetMap (OSM)
is a collaborative project which creates publicly available geographic
data of the world. Ingesting this data into a database where it can be
subsetted and processed is a large undertaking. Fortunately, third party
companies redistribute OSM data. We used Geofabrikâs download sites to
retrieve a shapefile of all highways in Texas and prepared a Geopackage
(.gpkg file) containing just the subset of roads that
intersect the Houston metropolitan area.Â
gis_osm_roads_free_1.gpkgWe can also obtain building data from OpenStreetMap. We again
downloaded from Geofabrick and prepared a GeoPackage containing only
houses in the Houston metropolitan area.
gis_osm_buildings_a_free_1.gpkgWe cannot readily get socioeconomic information for every home, so
instead we obtained data from the U.S. Census Bureauâs
American Community Survey for census tracts in 2019. The
folder ACS_2019_5YR_TRACT_48.gdb is an ArcGIS âfile
geodatabaseâ, a multi-file proprietary format thatâs roughly
analogous to a GeoPackage file.
You can use st_layers() to explore the contents of the
geodatabase. Each layer contains a subset of the fields documents in the
ACS
metadata.
The geodatabase contains a layer holding the geometry information,
separate from the layers holding the ACS attributes. You have to combine
the geometry with the attributes to get a feature layer that
sf can use.
Below is an outline of the steps you should consider taking to achieve the assignment tasks.
For improved computational efficiency and easier interoperability
with sf, I recommend using the stars package
for raster handling.
stars object for each date
(2021-02-07 and 2021-02-16)st_mosaiclibrary(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(sf)
## Linking to GEOS 3.10.2, GDAL 3.4.2, PROJ 8.2.1; sf_use_s2() is TRUE
library(stars)
## Loading required package: abind
library(tmap)
library(raster)
## Loading required package: sp
##
## Attaching package: 'raster'
## The following object is masked from 'package:dplyr':
##
## select
library(terra)
## terra 1.6.17
library(tmap)
library(ggplot2)
#reading in the tile data and storing it as raster files
nl_feb07_t1<- read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021038.h08v05.001.2021039064328.tif')
nl_feb07_t2 <- read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021038.h08v06.001.2021039064329.tif')
nl_feb16_t1 <-read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021047.h08v05.001.2021048091106.tif')
nl_feb16_t2 <- read_stars('/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/VNP46A1/VNP46A1.A2021047.h08v06.001.2021048091105.tif')
# combing the data into single stars objects for each day
feb16_tile <- st_mosaic(nl_feb16_t1, nl_feb16_t2)
feb07_tile <- st_mosaic(nl_feb07_t1, nl_feb07_t2)
find the change in night lights intensity (presumably) caused by the storm
reclassify the difference raster, assuming that any location that
experienced a drop of more than 200 nW cm-2sr-1
experienced a blackout
assign NA to all locations that experienced a drop
of less than 200 nW cm-2sr-1
# adding an indicator of the attributes in the data
feb_16_tile_names = setNames(feb16_tile, "light_16")
feb_07_tile_names = setNames(feb07_tile, "light_07")
# matrix alegbra to calculate the difference light difference between the two dates
blackout_dif <- feb_07_tile_names - feb_16_tile_names
# #filtering for the differences of a drop less that 200 nW cm-2sr-1 as NA
blackout_mask <- cut(blackout_dif, c(200, Inf), labels = "outage")
------------------------
st_as_sf() to vectorize the blackout maskst_make_valid# vectorizing the blackout mask and fixing any invalid geometries
blackout_mask_v <- st_as_sf(blackout_mask) |>
st_make_valid()
-------------------------
st_polygonst_sfc() and assign a CRS# creating points for the Houston polygon
hou_point1 <- st_point(c(-96.5, 29))
hou_point2 <- st_point(c(-96.5, 30))
hou_point3 <- st_point(c(-94.5, 30))
hou_point4 <- st_point(c(-94.5, 29))
# defining Houston's geometric region
hou_def <- st_sfc(list(hou_point1, hou_point2, hou_point3, hou_point4), crs = 'EPSG:4326')
# creating a polygon of Houston's coordinates
hou_border <- st_polygon(list(rbind(c(-96.5,29), c(-96.5,30.5), c(-94.5, 30.5), c(-94.5,29), c(-96.5,29))))
# converting to an sf object and identifying the coordinate reference system
hou_border_sf <- st_sfc(hou_border, crs = 'EPSG:4326')
# cropping the blackout mask with the Houston polygon
hou_outage_mask_v <- blackout_mask_v[hou_border_sf, ,]
# reprojectting the cropped object to a new crs and converting it as an sf object
hou_outage_mask_v_3083 <- st_transform(hou_outage_mask_v, crs = 'EPSG:3083')
outage_mask_clean <- st_as_sf(hou_outage_mask_v_3083)
The roads geopackage includes data on roads other than highways.
However, we can avoid reading in data we donât need by taking advantage
of st_readâs ability to subset using a SQL query.
st_readst_bufferst_buffer produces undissolved buffers, use
st_union to dissolve themquery <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"
highways <- st_read("data/gis_osm_roads_free_1.gpkg", query = query)
# reading in highway data using SQL query and st_read()
query <- "SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'"
highways <- st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_roads_free_1.gpkg", query = query)
## Reading query `SELECT * FROM gis_osm_roads_free_1 WHERE fclass='motorway'' from data source `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_roads_free_1.gpkg'
## using driver `GPKG'
## Simple feature collection with 6085 features and 10 fields
## Geometry type: LINESTRING
## Dimension: XY
## Bounding box: xmin: -96.50429 ymin: 29.00174 xmax: -94.39619 ymax: 30.50886
## Geodetic CRS: WGS 84
# selecting the highway geometry data
highways_geom <- highways$geom
# transforming the highway geometries to the consistent crs
highways_geom <- st_transform(highways_geom, crs = 'EPSG:3083')
# creating a buffer zone for highways geometry data of 200 meters
highway_buffer <- st_buffer(x = highways_geom, dist = 200)
highway_buffer <- st_transform(highway_buffer, crs = 'EPSG:3083')
# combining the geometries into one and creating a mask that excludes the highway data
highway_buffer <- st_union(highway_buffer, by_feature = FALSE)
mask_hou_highway <- outage_mask_clean[highway_buffer, , op = st_disjoint]
st_read and the following
SQL query to select only residential buildingsSELECT *Â
FROM gis_osm_buildings_a_free_1
WHERE (type IS NULL AND name IS NULL)
OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')
# defining the query for the buildings
query_houses <- "SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')"
# reading in the highways data with SQL
houses <- st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_buildings_a_free_1.gpkg", query = query_houses)
## Reading query `SELECT * FROM gis_osm_buildings_a_free_1 WHERE (type IS NULL AND name IS NULL) OR type in ('residential', 'apartments', 'house', 'static_caravan', 'detached')' from data source `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/gis_osm_buildings_a_free_1.gpkg'
## using driver `GPKG'
## Simple feature collection with 475941 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -96.50055 ymin: 29.00344 xmax: -94.53285 ymax: 30.50393
## Geodetic CRS: WGS 84
#filtering for blackout region
class(houses)
## [1] "sf" "data.frame"
houses <- st_transform(houses, crs = 'EPSG:3083')
houses_st <- st_as_sf(houses)
class(mask_hou_highway)
## [1] "sf" "data.frame"
# filtering
outage_houses <- houses_st[mask_hou_highway, drop = FALSE]
nrow(outage_houses)
## [1] 138652
print(paste0("There were ", nrow(outage_houses), " homes affected by the power outage on Feburary 16, 2021."))
## [1] "There were 138652 homes affected by the power outage on Feburary 16, 2021."
st_read() to load the geodatabase layersACS_2019_5YR_TRACT_48_TEXAS layerX19_INCOME layerB19013e1#reading in geometry data
census_geom <- st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer = "ACS_2019_5YR_TRACT_48_TEXAS")
## Reading layer `ACS_2019_5YR_TRACT_48_TEXAS' from data source
## `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb'
## using driver `OpenFileGDB'
## Simple feature collection with 5265 features and 15 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -106.6456 ymin: 25.83716 xmax: -93.50804 ymax: 36.5007
## Geodetic CRS: NAD83
census_geom <- st_transform(census_geom, crs = 'EPSG:3083')
#reading in income data
income_median <- st_read("/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb", layer = "X19_INCOME")
## Reading layer `X19_INCOME' from data source
## `/Users/colleenmccamy/Documents/MEDS/EDS_223_Spatial_Data/data/assignment3/ACS_2019_5YR_TRACT_48_TEXAS.gdb'
## using driver `OpenFileGDB'
## Warning: no simple feature geometries present: returning a data.frame or tbl_df
#selecting for my coloums
income_med_select <- income_median |>
dplyr::select("GEOID", "B19013e1") |>
rename(GEOID_Data = GEOID, median_income = B19013e1)
# identifying the class of both objects to merge
class(census_geom)
## [1] "sf" "data.frame"
class(income_med_select)
## [1] "data.frame"
# changing the income object to a data_frame
income_med_select_df <- tibble(income_med_select)
# joining census geometries and median income data
census_data <- left_join(census_geom,
income_med_select,
by = "GEOID_Data")
# transforming both objects to the correct crs
census_data <- st_transform(census_data, crs = 'EPSG:4326')
outage_houses <- st_transform(outage_houses, crs = 'EPSG:4326')
# ensuring that they are both sf objects
class(census_data)
## [1] "sf" "data.frame"
class(outage_houses)
## [1] "sf" "data.frame"
# filtering the census data using the outage houses and adding column indicating that these census tracts were part of a blackout
census_outage <- sf::st_filter(census_data, outage_houses) |>
mutate(blackout = 'yes')
Results & Limitations:
After identifying the average median income for homes in the Houston metropolitan area that experienced a blackout during Texasâs 2021 energy crisis, the average median income for homes that experienced a blackout was $71,435 and was higher for the average median income for homes that didnât experience a blackout at $64,494.
However, this study didnât account for the percentage of homes that fell in lower median income tracks versus the percentage of homes that fell in higher median income census tracks and thus weights all census tracks equally upon calculating the average median income. Further investigations could also group census tracks by income level and identify the percentage of impacted vs non-impacted homes for each income grouping to determine if lower median income levels were disproportionately affected compared to higher median income levels.
# transforming both objects to the crs 4326 to crop it
census_data <- st_transform(census_data, crs = 'EPSG:4326')
hou_border_sf <- st_transform(hou_border_sf, crs = 'EPSG:4326')
# cropping the census data with the Houston border for filtering
census_data_hou <- census_data[hou_border_sf, ,]
# transforming census data back to the EPSG:3083 crs
census_data_hou <- st_transform(census_data_hou, crs = 'EPSG:3083')
# selecting necessary columns for houston census data
census_data_hou <- census_data_hou |>
dplyr::select("NAMELSAD", "Shape", "median_income", "GEOID_Data")
# selecting necessary columns for outage data by census track
census_outage <- census_outage |>
dplyr::select("blackout", "GEOID_Data")
census_outage_map <- census_outage |>
dplyr::select("blackout")
# converting census outage data to a dataframe in order to join
census_outage_df <- as.data.frame(census_outage)
# joining census outage data and census data for all of Houston
census_map_data <- left_join(census_data_hou,
census_outage_df,
by = "GEOID_Data")
census_map_data <- census_map_data |>
dplyr::select('median_income', 'blackout')
tmap_mode("view")
## tmap mode set to interactive viewing
# mapping median income by census track and identifying outages by dots
tm_shape(census_map_data) +
tm_polygons(col = "median_income",
palette = c("#227c9d",
"#17c3b2",
"#ffcb77",
"#ffe2b3",
"#feb3b1",
"#fe6d73"),
textNA = "Missing Income Data",
colorNA = "#e4ebea",
title = "Median Income") +
tm_shape(census_outage_map) +
tm_dots(shape = 1,
title = 'blackout') +
tm_layout(main.title = "Houston Census Data by Income that Experienced A Power Outage",
legend.outside = TRUE,
main.title.size = 1
)